Similarity Measures for Multi-valued Attributes for Database Clustering

Authors

  • TAE-WAN RYU
  • CHRISTOPH F. EICK
Abstract

This paper introduces an approach to coping with the representational inadequacy of the traditional flat file format for data sets drawn from databases, specifically in database clustering. After analyzing the problems the flat file format has in representing related information, the paper introduces a better representation scheme, called the extended data set, that allows attributes of an object to have multiple values, and demonstrates how this scheme can represent structural information in databases for clustering. A unified similarity measure framework for mixed types of multi-valued and single-valued attributes is proposed. A query discovery system, MASSON, that takes each cluster as input is used to discover a set of queries that represent discriminant characteristic knowledge for each cluster.

INTRODUCTION

Many data analysis and data mining tools, such as clustering tools, inductive learning tools, and statistical analysis tools, assume that the data sets to be analyzed are represented as a single flat file (or table) in which an object is characterized by attributes that have a single value.

[Tables: Person, Purchase, Joined result]
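The flat-file problem and the extended data set described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the sample rows, the attribute names (`name`, `age`, `item`), the helper functions, the equal weights, and the choice of a generalized Jaccard coefficient for bags. It is not the similarity framework the paper defines, which this page does not reproduce.

```python
from collections import Counter

# Hypothetical flat "joined result": one row per (person, purchase) pair,
# so a person with several purchases is repeated across rows.
joined = [
    {"name": "Ann", "age": 30, "item": "book"},
    {"name": "Ann", "age": 30, "item": "pen"},
    {"name": "Bob", "age": 32, "item": "book"},
]


def to_extended(rows, key, multi):
    """Group flat rows by `key`, collecting the `multi` attribute into a bag."""
    objects = {}
    for row in rows:
        obj = objects.get(row[key])
        if obj is None:
            # Copy the single-valued attributes once per object.
            obj = {k: v for k, v in row.items() if k != multi}
            obj[multi] = Counter()
            objects[row[key]] = obj
        obj[multi][row[multi]] += 1
    return objects


def bag_similarity(a, b):
    """Generalized Jaccard on bags: sum of min counts over sum of max counts."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def numeric_similarity(x, y, value_range):
    """Similarity of two numbers, scaled by an assumed attribute range."""
    return 1.0 - abs(x - y) / value_range


def object_similarity(p, q, age_range=50.0, weights=(0.5, 0.5)):
    """Weighted mix of a single-valued and a multi-valued attribute similarity."""
    w_age, w_item = weights
    return (w_age * numeric_similarity(p["age"], q["age"], age_range)
            + w_item * bag_similarity(p["item"], q["item"]))


extended = to_extended(joined, key="name", multi="item")
score = object_similarity(extended["Ann"], extended["Bob"])
```

In the extended form each person appears once, with the purchases held as a bag, so a clustering algorithm compares two objects instead of three rows in which Ann is counted twice.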


Similar resources

SHAPLEY FUNCTION BASED INTERVAL-VALUED INTUITIONISTIC FUZZY VIKOR TECHNIQUE FOR CORRELATIVE MULTI-CRITERIA DECISION MAKING PROBLEMS

The interval-valued intuitionistic fuzzy set (IVIFS) was developed to cope with the uncertainty of imprecise human thinking. In the present communication, new entropy and similarity measures for IVIFSs based on the exponential function are presented and compared with the existing measures. Numerical results reveal that the proposed information measures attain the higher association with the existing me...


A Unified Similarity Measure for Attributes with Set or Bag of Values

Most similarity measures assume that each attribute for an object has a single value. However, there are many attributes that have a set or bag of values. This paper first discusses various similarity measures for single-valued attributes, group similarity measures, then proposes a unified framework for similarity measures that can cope with data sets with mixed types of attributes that may hav...


Attribute Similarity and Event Sequence Similarity in Data Mining

In data mining and knowledge discovery, similarity between objects is one of the central concepts. A measure of similarity can be user-defined, but an important problem is defining similarity on the basis of data. In this thesis we consider two kinds of similarity notions: similarity between binary valued attributes and between event sequences. Traditional approaches for defining similarity betwee...


An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...


Algorithm for defuzzification of multi-valued taxonomic attributes in similarity-based fuzzy relational databases

In this work we investigate our potential ability to discover knowledge from multi-valued attributes (often referred to in the literature on fuzzy databases as fuzzy collections [1-3]) that have been utilized in fuzzy relational database models [4-7] as a convenient way to represent uncertainty about the data registered in the data tables. We present here implementation details and extended tests of ...



Publication date: 1998